Efficient Nearest Neighbor Indexing Based on a Collection of Space Filling Curves

Author

  • Nimrod Megiddo
Abstract

A database is populated with a set of points represented by n-tuples of real numbers. A query consists of a point q (not necessarily in the database) and an integer k, asking for the k "nearest" database points to the query point. The exact output for the query consists of the k nearest points, but if the database is large and a quick response is required, a good approximate output is sought. All currently known methods require, at query time, calculating distances from the query point to many database points. The computational effort is dominated by the number of such distance calculations, since points have to be fetched from random locations in the database and the high dimension implies that current database indexes cannot significantly restrict the number of points that have to be fetched. In many cases, a complete linear scan of the database beats the currently known methods. The number of such distance calculations performed by current methods grows with the number of points in the database. The method described in this report has been shown (in experiments on databases with tens of thousands of points with hundreds of dimensions, and asking for about 100 nearest neighbors) to provide very good approximate output sets while limiting the number of distance calculations to a few hundred. Theoretical analysis predicts that the number of required distance calculations depends on the dimension and not on the number of points in the database. Therefore, when more points are added to a database, the dominant factor in the query effort does not change.

Nimrod Megiddo and Uri Shaft, IBM Almaden Research Center, October 20, 1997
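The idea in the abstract, bounding the number of distance calculations by looking only at points that are close to the query along several space-filling curves, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes coordinates in [0, 1), swaps in the simple Z-order (Morton) curve, and uses random shifts to obtain different curves. The names `morton_key`, `build_index`, `approx_knn`, and the `window` parameter are hypothetical.

```python
import bisect
import random

def morton_key(point, bits=8):
    """Z-order key: interleave the bits of coordinates quantized from [0, 1)."""
    top = (1 << bits) - 1
    cells = [min(int(x * (1 << bits)), top) for x in point]
    key = 0
    for b in range(bits - 1, -1, -1):          # most significant bit first
        for c in cells:
            key = (key << 1) | ((c >> b) & 1)
    return key

def _shift(point, shift):
    # Shift, then halve, so the result stays inside [0, 1) (an assumed transform).
    return [(x + s) / 2 for x, s in zip(point, shift)]

def build_index(points, num_curves=4, bits=8, seed=0):
    """One sorted list of (curve key, point id) per randomly shifted curve."""
    rng = random.Random(seed)
    dim = len(points[0])
    index = []
    for _ in range(num_curves):
        shift = [rng.random() for _ in range(dim)]
        keyed = sorted((morton_key(_shift(p, shift), bits), i)
                       for i, p in enumerate(points))
        index.append((shift, keyed))
    return index

def approx_knn(points, index, q, k, window=8, bits=8):
    """Collect curve neighbors of q on every curve, then rank them by true distance."""
    candidates = set()
    for shift, keyed in index:
        pos = bisect.bisect_left(keyed, (morton_key(_shift(q, shift), bits), -1))
        candidates.update(i for _, i in keyed[max(0, pos - window):pos + window])

    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    return sorted(candidates, key=sq_dist)[:k]
```

In this sketch a query computes at most `num_curves * 2 * window` distances, regardless of how many points the index holds, which mirrors the property the abstract claims for the method.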


Similar resources

Neighbor-finding based on space-filling curves

Nearest neighbor-finding is one of the most important spatial operations in the field of spatial data structures concerned with proximity. Because the goal of the space-filling curves is to preserve the spatial proximity, the nearest neighbor queries can be handled by these space-filling curves. When data is ordered by the Peano curve, we can directly compute the sequence numbers of the neighbo...


Fast k-NN classification rule using metric on space-filling curves

A fast nearest neighbor algorithm for pattern classification is proposed and tested on real data. The patterns (points in d-dimensional Euclidean space) are sorted along a space-filling curve. This way the multidimensional problem is compressed to the simplest case of the nearest neighbor search in one dimension.
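A minimal sketch of that reduction, under stated assumptions: the points are two-dimensional with coordinates in [0, 1), the curve is a Hilbert curve computed with the standard bit-manipulation routine, and closeness is measured as the difference between curve positions. The function names `hilbert_key`, `fit`, and `classify` are hypothetical and do not come from the paper.

```python
import bisect
from collections import Counter

def hilbert_key(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                    # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def fit(points, labels, n=256):
    """Sort the labeled patterns by their position along the curve."""
    return sorted((hilbert_key(n, int(px * n), int(py * n)), lab)
                  for (px, py), lab in zip(points, labels))

def classify(keyed, q, k=3, n=256):
    """Majority vote among the k patterns closest to q along the curve."""
    qk = hilbert_key(n, int(q[0] * n), int(q[1] * n))
    keys = [key for key, _ in keyed]
    hi = bisect.bisect_left(keys, qk)  # one-dimensional search
    lo = hi - 1
    votes = []
    while len(votes) < k and (lo >= 0 or hi < len(keys)):
        take_lo = hi >= len(keys) or (lo >= 0 and qk - keys[lo] <= keys[hi] - qk)
        if take_lo:
            votes.append(keyed[lo][1])
            lo -= 1
        else:
            votes.append(keyed[hi][1])
            hi += 1
    return Counter(votes).most_common(1)[0][0]
```

After the one-time sort, each query costs a binary search plus k steps along the sorted list. Points that are close in space can still be far apart along a single curve, so the result is approximate.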


Fast indexing method for multidimensional nearest-neighbor search

This paper describes a snapshot of work in progress on the development of an efficient file-access method for similarity searching in high-dimensional vector spaces. This method has applications in, for example, image databases where images are accessed via high-dimensional feature vectors. The technique is based on using a collection of space-filling curves as an auxiliary indexing structure. Initia...


3D Hilbert Space Filling Curves in 3D City Modeling for Faster Spatial Queries

The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional ...


Decreasing Radius K-Nearest Neighbor Search using Mapping-based Indexing Schemes

A decreasing radius k-nearest neighbor search algorithm for the mapping-based indexing schemes is presented. We implement the algorithm in the Pyramid technique and the iMinMax(θ), respectively. The Pyramid technique divides d-dimensional data space into 2d pyramids, and the iMinMax(θ) divides the data points into d partitions. Given a query point q, we initialize the radius of a range query to...
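To illustrate the partition into 2d pyramids, here is a small sketch of how a point can be mapped to a one-dimensional pyramid value. It assumes data normalized to the unit hypercube and follows the commonly described form of the Pyramid technique (pyramid number plus height above the center point); the search algorithm of this paper is not reproduced. The name `pyramid_value` is hypothetical.

```python
def pyramid_value(v):
    """Map a point v in [0, 1]^d to (pyramid number + height), a single float."""
    d = len(v)
    # The dimension in which v lies farthest from the center (0.5, ..., 0.5).
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie below the center in that dimension, d..2d-1 above it.
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])       # between 0 and 0.5
    return pyramid + height
```

Because the height never exceeds 0.5, the values of different pyramids do not overlap, so the integer part identifies one of the 2d pyramids and the result can be stored in an ordinary one-dimensional index.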




Publication date: 1997